Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimension: Monarch matrices, a simple class of expressive structured matrices that captures many linear transforms, achieves high hardware efficiency on GPUs, and scales sub-quadratically. As a proof of concept, we explore the performance of M2 in three domains: non-causal BERT-style language modeling, ViT-style image classification, and causal GPT-style language modeling. For non-causal BERT-style modeling, M2 matches BERT-base and BERT-large in downstream GLUE quality with up to 27% fewer parameters, and achieves up to 9.1× higher throughput at sequence length 4K.
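To make the "sub-quadratic structured matrix" idea concrete, the sketch below shows a Monarch-style multiply in NumPy: the length-n input is viewed as a b × b grid and mixed by two sets of small dense blocks, one per axis, so the cost is O(n^1.5) batched GEMMs rather than O(n^2) for a dense matrix. This is a minimal illustration under assumed conventions; the function name `monarch_multiply` and the exact permutation ordering are not the paper's reference implementation.

```python
import numpy as np

def monarch_multiply(x, blocks1, blocks2):
    """Multiply a length-n vector x by a Monarch-style matrix, n = b * b.

    The matrix factors into (permute, block-diagonal, permute,
    block-diagonal), so the product reduces to two batched b x b
    matmuls: roughly 2 * n^1.5 multiply-adds instead of n^2.

    blocks1, blocks2: arrays of shape (b, b, b), i.e. b dense b x b blocks.
    """
    b = blocks1.shape[0]
    assert x.shape[0] == b * b

    # Permute: view the vector as a b x b grid.
    X = x.reshape(b, b)
    # First block-diagonal multiply: row i is mixed by blocks1[i].
    X = np.einsum('ijk,ik->ij', blocks1, X)
    # Permute (transpose) so the second factor mixes the other axis.
    X = X.T
    # Second block-diagonal multiply.
    X = np.einsum('ijk,ik->ij', blocks2, X)
    # Flatten back to a length-n vector.
    return X.T.reshape(-1)

# Tiny usage example: n = 16, b = 4.
rng = np.random.default_rng(0)
b = 4
x = rng.standard_normal(b * b)
blocks1 = rng.standard_normal((b, b, b))
blocks2 = rng.standard_normal((b, b, b))
y = monarch_multiply(x, blocks1, blocks2)
print(y.shape)  # (16,)
```

Because each step is a batched small matmul, the whole operation maps directly onto GPU GEMM kernels, which is the hardware-efficiency argument the abstract makes.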
Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Qin, Zhen, Shen, Xuyang, Li, Dong, Sun, Weigao, Birchfield, Stan, Hartley, Richard, Zhong, Yiran
We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space model, long convolution, and linear RNN, within a single framework. The goal is to enhance comprehension of these models by analyzing the impact of each component from a cohesive and streamlined viewpoint. Specifically, we segment the modeling processes of these models into three distinct stages: Expand, Oscillation, and Shrink (EOS), with each model having its own specific settings. The Expand stage involves projecting the input signal onto a high-dimensional memory state. This is followed by recursive operations performed on the memory state in the Oscillation stage. Finally, the memory state is projected back to a low-dimensional space in the Shrink stage. We perform comprehensive experiments to analyze the impact of different stage settings on language modeling and retrieval tasks. Our results show that data-driven methods are crucial for the effectiveness of the three stages in language modeling, whereas hand-crafted methods yield better performance in retrieval tasks.
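To illustrate the Expand-Oscillation-Shrink (EOS) decomposition, here is a minimal linear-attention-style sketch of one recurrence step: the input is written into a high-dimensional memory state (Expand), the memory is updated recursively (Oscillation), and the memory is read back down to the output dimension (Shrink). The shapes, the decay-style oscillation term, and the helper name `lcsm_step` are illustrative assumptions; each model in the framework instantiates these stages with its own (data-driven or hand-crafted) parameterization.

```python
import numpy as np

def lcsm_step(m_prev, x_t, e_t, o_t, s_t):
    """One Expand-Oscillation-Shrink (EOS) recurrence step (illustrative).

    m_prev: (k, d) memory state
    x_t:    (d,)   input at time t
    e_t:    (k,)   expand vector   (writes the input into memory)
    o_t:    (k, d) oscillation term (elementwise recurrence on the memory)
    s_t:    (k,)   shrink vector   (reads the memory back out)
    """
    # Expand: project the input onto the high-dimensional memory state.
    write = np.outer(e_t, x_t)       # (k, d)
    # Oscillation: recursive update (here a simple elementwise decay).
    m_t = o_t * m_prev + write       # (k, d)
    # Shrink: project the memory back to the low-dimensional output.
    y_t = s_t @ m_t                  # (d,)
    return m_t, y_t

# Tiny usage example over a short sequence.
rng = np.random.default_rng(0)
k, d, T = 8, 4, 5
m = np.zeros((k, d))
for t in range(T):
    x_t = rng.standard_normal(d)
    e_t = rng.standard_normal(k)     # data-driven in practice
    o_t = np.full((k, d), 0.9)       # hand-crafted decay, for illustration
    s_t = rng.standard_normal(k)
    m, y_t = lcsm_step(m, x_t, e_t, o_t, s_t)
print(y_t.shape)  # (4,)
```

The per-step cost is O(k·d) with a fixed-size state, which is where the linear complexity in sequence length comes from.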